This is an R Markdown Notebook. When you execute code within the notebook, the results appear beneath the code.
Try executing this chunk by clicking the Run button within the chunk or by placing your cursor inside it and pressing Cmd+Shift+Enter.
library(ggplot2)
library(readr)
library(tidyr)
library(dplyr)
Attaching package: ‘dplyr’
The following objects are masked from ‘package:stats’:
filter, lag
The following objects are masked from ‘package:base’:
intersect, setdiff, setequal, union
# Read the CSV file
data <- read_csv("../../ProcessedData/final_vegePrice0.csv")
Rows: 132 Columns: 2756
── Column specification ────────────────────────────────────────
Delimiter: ","
chr (3): Commodity, Unit, Category
dbl (2753): 2013-06-16, 2013-06-17, 2013-06-18, 2013-06-19, 2013-06-20, 2013-06-21, 2013-06-25, 2013-06-26, 2013-06-27, 2013-06-28, 2013-06-30, 2013-07-01, 2013-07-02, 2013-07-03, 2013-07-04, ...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Convert the data from wide to long format
long_data <- data %>%
  pivot_longer(cols = starts_with("20"), names_to = "Date", values_to = "Price") %>%
  mutate(Date = as.Date(Date))
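The reshape above can be checked on a small table before running it on all 2,753 date columns. The following is a minimal sketch: `toy_wide`, `toy_long`, and the prices in them are hypothetical and are not taken from final_vegePrice0.csv.

```r
library(tidyr)
library(dplyr)

# Hypothetical two-row table shaped like the price file: one column per date
toy_wide <- tibble::tibble(
  Commodity = c("Tomato", "Potato"),
  Unit = c("Kg", "Kg"),
  Category = c("Vegetable", "Vegetable"),
  `2013-06-16` = c(35, 20),
  `2013-06-17` = c(NA, 21)
)

# Same reshape as above: date columns become rows, then Date is parsed
toy_long <- toy_wide %>%
  pivot_longer(cols = starts_with("20"), names_to = "Date", values_to = "Price") %>%
  mutate(Date = as.Date(Date))

# Expect one row per commodity-date pair and a proper Date column
stopifnot(nrow(toy_long) == 4, inherits(toy_long$Date, "Date"))
```

Missing prices stay in the long table as `NA` rows, which is what the missing-value count later in the notebook relies on.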
# Plotting all commodities
ggplot(long_data, aes(x = Date, y = Price, group = Commodity, color = Commodity)) +
  geom_line() +
  theme_minimal() +
  labs(title = "Price Trends of All Commodities",
       x = "Date",
       y = "Price") +
  theme(legend.position = "none")
# Creating multiple plots, one for each category
plots <- long_data %>%
  group_by(Category) %>%
  do(plot = ggplot(., aes(x = Date, y = Price, group = Commodity, color = Commodity)) +
       geom_line() +
       theme_minimal() +
       labs(title = paste("Price Trends in Category:", unique(.$Category)),
            x = "Date",
            y = "Price") +
       theme(legend.position = "none"))
# Print each plot; print() returns the plot invisibly, so plot_list also keeps them
plot_list <- lapply(plots$plot, print)
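`do()` is marked as superseded in current dplyr releases, so the per-category plots could also be built with base `split()` and `lapply()`. This is only a sketch of that alternative, not the notebook's own method; `toy_long` and `by_category` are hypothetical names with made-up prices.

```r
library(ggplot2)

# Hypothetical long-format data with two categories
toy_long <- data.frame(
  Date = rep(as.Date(c("2013-06-16", "2013-06-17")), times = 3),
  Price = c(35, 36, 20, 21, 80, 82),
  Commodity = rep(c("Tomato", "Potato", "Apple"), each = 2),
  Category = rep(c("Vegetable", "Vegetable", "Fruit"), each = 2)
)

# split() returns a named list with one data frame per category
by_category <- split(toy_long, toy_long$Category)

# Build one line plot per category, titled with the category name
plot_list <- lapply(names(by_category), function(category) {
  ggplot(by_category[[category]],
         aes(x = Date, y = Price, group = Commodity, color = Commodity)) +
    geom_line() +
    theme_minimal() +
    labs(title = paste("Price Trends in Category:", category),
         x = "Date", y = "Price") +
    theme(legend.position = "none")
})
names(plot_list) <- names(by_category)
```

Here `plot_list` is a named list, so a single category can be drawn with `print(plot_list[["Fruit"]])`.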
# Counting missing values per commodity
missing_values <- long_data %>%
  group_by(Commodity) %>%
  summarize(Missing = sum(is.na(Price)))
# Calculate the total number of dates
total_dates <- nrow(long_data) / n_distinct(long_data$Commodity)
# Adding percentage of missing data
missing_values <- missing_values %>%
  mutate(PercentMissing = (Missing / total_dates) * 100)
# Print the number of missing values and their percentage
print(missing_values)
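The `total_dates` calculation above assumes every commodity has the same number of rows in `long_data`, which may not hold if a commodity name appears more than once in the source file. A sketch that avoids the assumption takes the share of missing prices within each group; `toy_long` and `toy_missing` are hypothetical names with made-up values.

```r
library(dplyr)

# Hypothetical long-format data: commodity A has 1 of 4 prices missing, B has none
toy_long <- tibble::tibble(
  Commodity = rep(c("A", "B"), each = 4),
  Price = c(10, NA, 12, 13, 5, 6, 7, 8)
)

toy_missing <- toy_long %>%
  group_by(Commodity) %>%
  summarize(
    Missing = sum(is.na(Price)),
    PercentMissing = mean(is.na(Price)) * 100  # share of this commodity's own rows
  )

print(toy_missing)
```

Because `mean(is.na(Price))` divides by each group's own row count, no separate total is needed.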
print(long_data)
# Filter out commodities with more than 10% missing data
commodities_to_keep <- missing_values %>%
  filter(PercentMissing <= 10) %>%
  select(Commodity)
# Filter the long_data dataframe to only include the selected commodities
filtered_long_data <- long_data %>%
  filter(Commodity %in% commodities_to_keep$Commodity)
print(filtered_long_data)
# Re-creating the per-category plots using only the commodities that were kept
plots <- filtered_long_data %>%
  group_by(Category) %>%
  do(plot = ggplot(., aes(x = Date, y = Price, group = Commodity, color = Commodity)) +
       geom_line() +
       theme_minimal() +
       labs(title = paste("Price Trends in Category:", unique(.$Category)),
            x = "Date",
            y = "Price") +
       theme(legend.position = "none"))
# Print each plot; print() returns the plot invisibly, so plot_list also keeps them
plot_list <- lapply(plots$plot, print)
Add a new chunk by clicking the Insert Chunk button on the toolbar or by pressing Cmd+Option+I.
When you save the notebook, an HTML file containing the code and output will be saved alongside it (click the Preview button or press Cmd+Shift+K to preview the HTML file).
The preview shows you a rendered HTML copy of the contents of the editor. Consequently, unlike Knit, Preview does not run any R code chunks. Instead, the output of the chunk when it was last run in the editor is displayed.